Lua strings are not UTF-8, so it is incorrect to represent them as Rust strings. In particular, some Lua libraries specifically designed to add Unicode support, like ustring, use `string.find` to look for non-ASCII bytes in Lua strings, which is impossible when Lua strings are treated as UTF-8 (a character set of `[\x80-\xff]` is ill-formed). This is an API-breaking change. Note that downstream consumers like piccolo will now convert using `IntoValue for Vec<T>` instead of `IntoValue for StdString`, so they need to be changed to call `ctx.intern` explicitly instead of relying on the auto-conversion.
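For illustration, the migration for a piccolo consumer looks roughly like this (a minimal sketch assuming piccolo's `Context::intern`; the helper and its name are hypothetical):

```rust
use piccolo::{Context, Value};

// Hypothetical helper: turn matcher output into a Lua value. Interning the
// raw bytes preserves non-UTF-8 contents such as `[\x80-\xff]` sequences,
// which the old `IntoValue for StdString` route could not represent.
fn bytes_to_lua<'gc>(ctx: Context<'gc>, bytes: Vec<u8>) -> Value<'gc> {
    Value::String(ctx.intern(&bytes))
}
```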
PIL 20.2: > An upper case version of any of those classes represents the complement of the class.
Because this changes the signature of an exported function (removing an unnecessary `Result`) and a struct (eliminating an unnecessary `Box`), this patch contains API-breaking changes.
This makes the replacement method agnostic to whatever type the consumer is using, instead of requiring the table to be collected into a `HashMap`. This is an API-breaking change.
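The shape of the change is roughly the following (a minimal sketch; `replace_words` and its signature are illustrative stand-ins, not lsonar's actual API):

```rust
use std::collections::BTreeMap;

// The replacement source is a callback rather than a concrete `HashMap`,
// so the consumer can back it with any table-like type.
fn replace_words(subject: &str, mut lookup: impl FnMut(&str) -> Option<String>) -> String {
    subject
        .split_whitespace()
        .map(|w| lookup(w).unwrap_or_else(|| w.to_string()))
        .collect::<Vec<_>>()
        .join(" ")
}

fn main() {
    // Any map-like backing works; here a BTreeMap stands in for a Lua table.
    let table = BTreeMap::from([("hello", "world")]);
    let out = replace_words("hello there", |k| table.get(k).map(|v| v.to_string()));
    assert_eq!(out, "world there");
}
```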
Lua 5.4 § 6.4.1: > As a special case, the capture () captures the current string position (a number). For instance, if we apply the pattern "()aa()" on the string "flaaap", there will be two captures: 3 and 5.
Hello! I will definitely review everything in the next two days and provide comments. Thank you so much!
This is necessary for interleaving VM function calls, where a callback cannot complete synchronously, without forcing the normal API to be async.
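The shape this enables is roughly the following (a hypothetical sketch; `GsubMachine`, `Step`, and `step` are illustrative names, not lsonar's API, and a literal needle stands in for a full pattern):

```rust
struct GsubMachine<'a> {
    subject: &'a str,
    needle: &'a str, // literal needle stands in for a full pattern
    pos: usize,
    out: String,
}

enum Step {
    // Caller must supply a replacement for the match, then call `step` again;
    // a VM can run a Lua callback between these two calls.
    NeedsReplacement,
    Done(String),
}

impl<'a> GsubMachine<'a> {
    fn new(subject: &'a str, needle: &'a str) -> Self {
        Self { subject, needle, pos: 0, out: String::new() }
    }

    fn step(&mut self, replacement: Option<&str>) -> Step {
        if let Some(r) = replacement {
            self.out.push_str(r); // splice in the caller-provided replacement
        }
        match self.subject[self.pos..].find(self.needle) {
            Some(i) => {
                self.out.push_str(&self.subject[self.pos..self.pos + i]);
                self.pos += i + self.needle.len();
                Step::NeedsReplacement
            }
            None => {
                self.out.push_str(&self.subject[self.pos..]);
                Step::Done(std::mem::take(&mut self.out))
            }
        }
    }
}

fn main() {
    // The synchronous wrapper is just a loop; a VM would instead suspend
    // between `step` calls while a Lua function computes the replacement.
    let mut m = GsubMachine::new("a.b.c", ".");
    let mut reply: Option<&str> = None;
    loop {
        match m.step(reply.take()) {
            Step::NeedsReplacement => reply = Some("/"),
            Step::Done(s) => {
                assert_eq!(s, "a/b/c");
                break;
            }
        }
    }
}
```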
Thanks! After everything, it was also necessary to add interruptible gsub, so I pushed a change for that. At the same time I noticed a couple of places where I forgot to add `Eq` markers. I’m definitely, probably, done now, pending any feedback. I also have a complete working implementation of these methods for the piccolo VM stdlib now; if you want me to put those somewhere, or to send something upstream to supersede your PR, let me know.
Closely associated types can be stored in the same module. This reduces the amount of re-exporting that has to happen to get everything out, reduces file switching during development, and may decrease the chance of creating confusing type name conflicts.
Due to current non-conformance in the engine, this causes tests to fail.
It is not possible for the lexer to know the nesting level of sets, since the pattern `[]]` is valid: a `]` that appears immediately after the opening `[` is a literal, so `[]]` is a set matching `]`. There were bad tests in the lsonar test suite that did not conform to PUC-Lua; they have been changed for conformance.
This represents the null character and is tested in the PUC-Lua test suite.
6.4.1: > %x: (where x is any non-alphanumeric character)
For example, anchors.
6.4.1: > The beginning and the end of the subject are handled as if they were the character '\0'.
6.4.1: > A pattern item can be […] a single character class followed by '*' […]

Captures, balances, and frontiers are not character classes.
The previous design, while beautiful, was inefficient and not capable of correctly handling non-greedy matching inside a tree node. An example is the common pattern for trimming strings, `^%s*(.-)%s*$`. A non-greedy match needs to consume one token per iteration until the *entire pattern* fails to match, but this is not possible when only a subtree is visible to the repetition function. Trying to deal with this by naïve linearisation means that the edge of the capture is lost. I imagine there is some technique that would allow this to work (I am no parser expert), but the thing about Lua patterns is that they are far too simple for this level of complexity in the first place. So, this commit deletes the old engine and replaces it with a hand-translation of the PUC-Lua pattern-matching code, which works and passes the Lua test suite.
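For reference, the control flow that makes this work in PUC-Lua is `min_expand` in lstrlib.c; a hand-translated Rust sketch of it (with stand-ins for the matcher internals, not the actual code from this commit) looks like:

```rust
// Stand-ins so the sketch compiles; only the control flow matters here.
struct MatchState { /* subject bytes, pattern bytes, captures, ... */ }
fn match_rest(_ms: &mut MatchState, _s: usize, _p: usize) -> Option<usize> { None }
fn single_match(_ms: &MatchState, _s: usize, _p: usize, _ep: usize) -> bool { false }

// Non-greedy expansion: after every failed attempt to match the rest of the
// *entire pattern* (starting at `ep + 1`, past the '-'), consume exactly one
// more character with the repeated class and try again.
fn min_expand(ms: &mut MatchState, mut s: usize, p: usize, ep: usize) -> Option<usize> {
    loop {
        if let Some(end) = match_rest(ms, s, ep + 1) {
            return Some(end); // the whole remaining pattern matched
        }
        if single_match(ms, s, p, ep) {
            s += 1; // expand the non-greedy item by one character
        } else {
            return None; // cannot expand further: the item fails
        }
    }
}
```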
Hi again. This PR has totally changed, because somehow it took me way too long to see that this library as written was not actually matching according to the actual Lua language? I’m very confused about what source was used to decide what Lua patterns look like; a lot of time was wasted. For posterity, I retained the historical attempts to change the old design to actually conform, but it was impossible due to the broken non-greedy backtracking. As an academic point, I would enjoy knowing whether there is some way to make the old design not broken in this way without making it even more inefficient, but in real life this new one will work better. Since it does not create an IR, it is substantially more efficient. It could be touched up to be more Rust-y, but I’ve spent too much time on this already. Best,
Hello,
This PR contains the previous three PRs that I opened separately, plus additional patches. Changes include:
- replacing `HashMap` in the API with a simple callback function so the table type is agnostic to the consumer

**Options**

The edition and MSRV are bumped to use the stabilised if-and-while-let chain syntax. This isn’t strictly necessary, but it makes the code cleaner in areas and I don’t know why not to do this.
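As a small illustration of the syntax that motivates the bump (let chains are stable as of Rust 1.88 on edition 2024; the function here is just an example):

```rust
// With a let chain, a pattern match and a guard combine in one `if`,
// avoiding the old nested `if let` inside an `if`.
fn first_if_even(xs: &[i32]) -> Option<i32> {
    if let Some(&x) = xs.first()
        && x % 2 == 0
    {
        return Some(x);
    }
    None
}
```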
The error handling is not quite as good as it could be, but probably more than good enough given the scope of the library. In particular, there are redundant variants for unexpected tokens, because I ran out of interest in continuing to work on it. syn’s `Lookahead1` would be a good way to collect all possible expected tokens into something that can then be turned into a single error.

I would also recommend converting the internal engine tests (in `src/engine/tests.rs`) to ones that check the public APIs.

I suppose this should make the library more or less 1.0-ready, since the API is well-defined and limited, but I only bumped the pre-release number due to the many API-breaking changes.
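To sketch what that looks like with syn’s real `Lookahead1` (lsonar would implement an analogous accumulator over its own token type; `Arg` here is just an example):

```rust
use syn::parse::{Parse, ParseStream};
use syn::{Ident, LitInt, Result};

// Example parse target: either an identifier or an integer index.
enum Arg {
    Name(Ident),
    Index(LitInt),
}

impl Parse for Arg {
    fn parse(input: ParseStream) -> Result<Self> {
        // `Lookahead1` records every token type peeked, so `.error()`
        // reports all expected alternatives in one message.
        let lookahead = input.lookahead1();
        if lookahead.peek(Ident) {
            Ok(Arg::Name(input.parse()?))
        } else if lookahead.peek(LitInt) {
            Ok(Arg::Index(input.parse()?))
        } else {
            Err(lookahead.error())
        }
    }
}
```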
OK, that’s it from me for now. I will be able to answer questions or maybe make minor changes, but I don’t anticipate working on this any more from here (unless I run into some other bug).
Thanks!